dRAP-Independent: A Data Distribution Algorithm for Mining First-Order Frequent Patterns

نویسندگان

  • Jan Blatak
  • Lubos Popelínský
چکیده

In this paper we present dRAP-Independent, an algorithm for independent distributed mining of first-order frequent patterns. This system is based on RAP, an algorithm for finding maximal frequent patterns in first-order logic. dRAPIndependent utilizes a modified data partitioning schema introduced by Savasere et al. and offers good performance and low communication overhead. We analyze the performance of the algorithm on four different tasks: Mutagenicity prediction – a standard ILP benchmark, information extraction from biological texts, contextsensitive spelling correction, and morphological disambiguation of Czech. The results of the analysis show that the algorithm can generate more patterns than the serial algorithm RAP in the same overall time.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses, it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

متن کامل

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertai...

متن کامل

Extending the Order Preserving Submatrix: New patterns in datasets

This paper concerns in finding local patterns in gene expression datasets. We present new order relation patterns, and develop algorithms which finds those pattern. Our algorithms are the first algorithms to find the exact results for those patterns, yet in most cases they outperforms existing heuristical algorithm. Finally we present an algorithm for the broader problem of frequent itemset min...

متن کامل

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...

متن کامل

MUSK: Uniform Sampling of k Maximal Patterns

Recent research in frequent pattern mining (FPM) has shifted from obtaining the complete set of frequent patterns to generating only a representative (summary) subset of frequent patterns. Most of the existing approaches to this problem adopt a two-step solution; in the first step, they obtain all the frequent patterns, and in the second step, some form of clustering is used to obtain the summa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computing and Informatics

دوره 26  شماره 

صفحات  -

تاریخ انتشار 2007